Sampling Based Fitted Value Iteration Finite-Time Bounds for Sampling-Based Fitted Value Iteration

نویسندگان

Rémi Munos

Csaba Szepesvári

چکیده

In this paper we develop a theoretical analysis of the performance of sampling-based fitted value iteration (FVI) to solve infinite state-space, discounted-reward Markovian decision problems (MDPs) under the assumption that a generative model of the environment is available. Our main results come in the form of finite-time bounds on the performance of two versions of sampling-based FVI. The convergence rate results obtained allow us to show that both versions of FVI are well behaving in the sense that by using a sufficiently large number of samples for a large class of MDPs, arbitrary good performance can be achieved with high probability. An important feature of our proof technique is that it permits the study of weighted L-norm performance bounds. As a result, our technique applies to a large class of function-approximation methods (e.g., neural networks, adaptive regression trees, kernel machines, locally weighted learning), and our bounds scale well with the effective horizon of the MDP. The bounds show a dependence on the stochastic stability properties of the MDP: they scale with the discounted-average concentrability of the future-state distributions. They also depend on a new measure of the approximation capabilities of the function space, the inherent Bellman residual, which reflects how well the function space is “aligned” with the essential characteristics of the dynamics of the MDP. The conditions of the main result, as well as the concepts introduced in the analysis, are extensively discussed and compared to previous theoretical results. Numerical experiments are used to substantiate the theoretical findings.

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Finite-Time Bounds for Fitted Value Iteration

متن کامل

Sampling-based Approximations with Quantitative Performance for the Probabilistic Reach-Avoid Problem over General Markov Processes

This article deals with stochastic processes endowed with the Markov (memoryless) property and evolving over general (uncountable) state spaces. The models further depend on a non-deterministic quantity in the form of a control input, which can be selected to affect the probabilistic dynamics. We address the synthesis of optimal controllers that maximize the probability associated to a rather g...

متن کامل

Tutorial notes on Sampling Based Fitted Dynamic Programming Finite-Time Bounds for Sampling-Based Fitted Dynamic Programming

متن کامل

Approximate modified policy iteration and its application to the game of Tetris

Modified policy iteration (MPI) is a dynamic programming (DP) algorithm that contains the two celebrated policy and value iteration methods. Despite its generality, MPI has not been thoroughly studied, especially its approximation form which is used when the state and/or action spaces are large or infinite. In this paper, we propose three implementations of approximate MPI (AMPI) that are exten...

متن کامل

Fitted Q-iteration in continuous action-space MDPs

We consider continuous state, continuous action batch reinforcement learning where the goal is to learn a good policy from a sufficiently rich trajectory generated by some policy. We study a variant of fitted Q-iteration, where the greedy action selection is replaced by searching for a policy in a restricted set of candidate policies by maximizing the average action values. We provide a rigorou...

متن کامل

ذخیره در منابع من

ذخیره در منابع من قبلا به منابع من ذحیره شده

{@ msg_add @}

با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

عنوان ژورنال:

دوره شماره

صفحات -

تاریخ انتشار 2007

Sampling Based Fitted Value Iteration Finite-Time Bounds for Sampling-Based Fitted Value Iteration

نویسندگان

چکیده

منابع مشابه

Finite-Time Bounds for Fitted Value Iteration

Sampling-based Approximations with Quantitative Performance for the Probabilistic Reach-Avoid Problem over General Markov Processes

Tutorial notes on Sampling Based Fitted Dynamic Programming Finite-Time Bounds for Sampling-Based Fitted Dynamic Programming

Approximate modified policy iteration and its application to the game of Tetris

Fitted Q-iteration in continuous action-space MDPs

عنوان ژورنال:

اشتراک گذاری